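Collaborative robots stand to have an immense impact both on human welfare in domestic service applications and on industrial superiority in advanced manufacturing, which requires dexterous assembly. The outstanding challenge is to give robotic fingertips a physical design that makes them adept at dexterous tasks requiring high-resolution, calibrated shape reconstruction and force sensing. In this work, we present DenseTact 2.0, an optical-tactile sensor capable of visualizing the deformed surface of a soft fingertip and using that image in a neural network to perform both calibrated shape reconstruction and 6-axis wrench estimation. We demonstrate a sensor accuracy of 0.3633 mm per pixel for shape reconstruction, 0.410 N for forces, and 0.387 N·mm for torques, as well as the ability to calibrate new fingers through transfer learning, achieving comparable performance while training more than four times faster with only 12% of the dataset size.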
translated by 谷歌翻译
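Increasing the performance of tactile sensing in robots enables versatile, in-hand manipulation. Vision-based tactile sensors have been widely used, as rich tactile feedback has been shown to correlate with increased performance in manipulation tasks. Existing high-resolution tactile sensor solutions have limitations that include low accuracy, expensive components, or lack of scalability. In this paper, an inexpensive, scalable, and compact tactile sensor with high-resolution surface deformation modeling for the 3D sensor surface is proposed. By measuring the image from a fisheye camera, it is shown that the sensor can successfully estimate the surface deformation in real time (1.8 ms) using deep convolutional neural networks. In its design and sensing abilities, this sensor represents a significant step toward better in-hand object localization, classification, and surface estimation, all enabled by high-resolution shape reconstruction.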
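Reliable traffic flow prediction is crucial to creating intelligent transportation systems. Many big-data-based prediction approaches have been developed, but they do not reflect the complicated dynamic interactions between roads with respect to time and location. In this study, we propose a dynamically localised long short-term memory (LSTM) model that involves both spatial and temporal dependence between roads. To do so, we use a localised dynamic spatial weight matrix along with its dynamic variation. Moreover, the LSTM model can deal with sequential data with long dependencies as well as complex non-linear features. Empirical results indicate superior prediction performance of the proposed model compared with two different baseline methods.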
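The abstract does not define the localised dynamic spatial weight matrix, so the following is only a minimal sketch of one plausible reading, not the authors' formulation. It assumes that "localised" means only roads within a hypothetical `radius` influence each other, and that "dynamic" means the weights are recomputed at each time step from the current speeds. The function names, the weighting formula, and the sample data are all illustrative assumptions.

```python
import math


def local_dynamic_weights(coords, speeds, radius):
    """Row-normalised spatial weights for one time step (assumed form).

    Roads farther apart than `radius` get weight 0 (the "localised" part).
    Nearby roads with similar current speeds get larger weights, so the
    matrix changes whenever the speeds change (the "dynamic" part).
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if 0.0 < d <= radius:
                weights[i][j] = 1.0 / (d * (1.0 + abs(speeds[i] - speeds[j])))
        row_sum = sum(weights[i])
        if row_sum > 0.0:
            weights[i] = [w / row_sum for w in weights[i]]
    return weights


def spatial_lag(weights, speeds):
    """Weighted average of neighbouring speeds, one value per road."""
    return [sum(w * s for w, s in zip(row, speeds)) for row in weights]


# Hypothetical data: three road sensors on a line, speeds in km/h.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
speeds = [50.0, 40.0, 30.0]
W = local_dynamic_weights(coords, speeds, radius=2.0)
lagged = spatial_lag(W, speeds)
```

In a model of this kind, the spatially lagged values would typically be concatenated with each road's own history and fed to the LSTM as an extra input feature. A road with no neighbour inside the radius keeps an all-zero row.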
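Classification models trained on biased datasets usually perform poorly on out-of-distribution samples because biased representations are embedded in the model. Recently, various debiasing methods have been proposed to disentangle biased representations, but discarding only the biased features without altering other relevant information is challenging. In this paper, we propose a novel augmentation approach that explicitly generates additional images using the texture representations of differently labeled images, in order to enlarge the training dataset and mitigate the effect of bias when training a classifier. Each newly generated image contains similar content information from a source image while transferring the texture from a target image with a different label. Our model includes a texture co-occurrence loss, which determines whether the texture of a generated image is similar to that of the target, and a spatial self-similarity loss, which determines whether the content details of the generated and source images are preserved. Both the generated and the original training images are then used to train a classifier with improved robustness against biased representations. We use five distinct artificially designed datasets with known biases to demonstrate the ability of our method to mitigate bias information. In all cases, our method outperforms existing state-of-the-art methods. Code is available at: https://github.com/myeongkyunkang/i2i4debias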
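We introduce the Korean Language Understanding Evaluation (KLUE) benchmark. KLUE is a collection of 8 Korean natural language understanding (NLU) tasks, including Topic Classification, Semantic Textual Similarity, Natural Language Inference, Named Entity Recognition, Relation Extraction, Dependency Parsing, Machine Reading Comprehension, and Dialogue State Tracking. We build all of the tasks from scratch from diverse source corpora while respecting copyrights, to ensure accessibility for anyone without any restrictions. With ethical considerations in mind, we carefully design the annotation protocols. Along with the benchmark tasks and data, we provide suitable evaluation metrics and fine-tuning recipes for pretrained language models for each task. We also release the pretrained language models (PLMs) KLUE-BERT and KLUE-RoBERTa to help reproduce baseline models on KLUE and thereby facilitate future research. We make a few interesting observations from preliminary experiments with the proposed KLUE benchmark suite, which already demonstrate its usefulness. First, we find that KLUE-RoBERTa-large outperforms the other baselines, including multilingual PLMs and existing open-source Korean PLMs. Second, we see minimal degradation in performance even when we replace personally identifiable information in the pretraining corpus, suggesting that privacy and NLU capability are not at odds with each other. Lastly, we find that using BPE tokenization in combination with morpheme-level pre-tokenization is effective in tasks involving morpheme-level tagging, detection, and generation. In addition to accelerating Korean NLP research, our comprehensive documentation on creating KLUE will facilitate the creation of similar resources for other languages in the future. KLUE is available at https://klue-benchmark.com.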
Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model, either by analyzing the behavior of the model during training or by measuring the performance gap of the model when the instance is removed from the dataset. Such approaches reveal the characteristics and importance of individual instances, which may provide useful information for diagnosing and improving deep learning. However, most existing works on data valuation require actual training of a model, which often incurs a high computational cost. In this paper, we provide a training-free data valuation score, called the complexity-gap score, which is a data-centric score that quantifies the influence of individual instances on the generalization of two-layer overparameterized neural networks. The proposed score can quantify the irregularity of instances and measure how much each data instance contributes to the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding irregular or mislabeled data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics.
Most existing text-video retrieval methods focus on cross-modal matching between the visual content of offline videos and textual query sentences. However, in real scenarios, online videos are frequently accompanied by relevant text information such as titles, tags, and even subtitles, which can be utilized to match textual queries. This inspires us to generate associated captions from offline videos to help with existing text-video retrieval methods. To do so, we propose to use the zero-shot video captioner with knowledge of pre-trained web-scale models (e.g., CLIP and GPT-2) to generate captions for offline videos without any training. Given the captions, one question naturally arises: what can auxiliary captions do for text-video retrieval? In this paper, we present a novel framework Cap4Video, which makes use of captions from three aspects: i) Input data: The video and captions can form new video-caption pairs as data augmentation for training. ii) Feature interaction: We perform feature interaction between video and caption to yield enhanced video representations. iii) Output score: The Query-Caption matching branch can be complementary to the original Query-Video matching branch for text-video retrieval. We conduct thorough ablation studies to demonstrate the effectiveness of our method. Without any post-processing, our Cap4Video achieves state-of-the-art performance on MSR-VTT (51.4%), VATEX (66.6%), MSVD (51.8%), and DiDeMo (52.0%).
With the rapid development of drone technologies, drones are now widely used in many applications, including military domains. In this paper, we propose a novel situation-aware, deep reinforcement learning (DRL)-based autonomous nonlinear drone mobility control algorithm for cyber-physical loitering munition applications. On the battlefield, designing a DRL-based autonomous control algorithm is not straightforward because real-world data gathering is generally not possible. Therefore, our approach constructs a cyber-physical virtual environment in Unity. Based on the virtual cyber-physical battlefield scenarios, a DRL-based automated nonlinear drone mobility control algorithm can be designed, evaluated, and visualized. Moreover, real-world battlefield scenarios contain many obstacles, which are harmful to linear trajectory control. Thus, the proposed autonomous nonlinear drone mobility control algorithm uses situation-aware components that are implemented with a Raycast function in the Unity virtual scenarios. Based on the gathered situation-aware information, the drone can autonomously and nonlinearly adjust its trajectory during flight, which helps it avoid obstacles in obstacle-deployed battlefields. Our visualization-based performance evaluation shows that the proposed algorithm outperforms the linear mobility control algorithms used for comparison.
The problem of detecting the Out-of-Distribution (OoD) inputs is of paramount importance for Deep Neural Networks. It has been previously shown that even Deep Generative Models that allow estimating the density of the inputs may not be reliable and often tend to make over-confident predictions for OoDs, assigning to them a higher density than to the in-distribution data. This over-confidence in a single model can be potentially mitigated with Bayesian inference over the model parameters that take into account epistemic uncertainty. This paper investigates three approaches to Bayesian inference: stochastic gradient Markov chain Monte Carlo, Bayes by Backpropagation, and Stochastic Weight Averaging-Gaussian. The inference is implemented over the weights of the deep neural networks that parameterize the likelihood of the Variational Autoencoder. We empirically evaluate the approaches against several benchmarks that are often used for OoD detection: estimation of the marginal likelihood utilizing sampled model ensemble, typicality test, disagreement score, and Watanabe-Akaike Information Criterion. Finally, we introduce two simple scores that demonstrate the state-of-the-art performance.
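Of the scores listed above, the Watanabe-Akaike Information Criterion is the simplest to state. In the form commonly used for OoD detection, it is the mean of the per-model log-likelihoods of an input minus their variance across the sampled models. The sketch below assumes that form and uses the population variance; the log-likelihood values are made up for illustration and do not come from the paper.

```python
from statistics import fmean, pvariance


def waic_score(log_liks):
    """WAIC-style score for one input.

    log_liks holds log p(x | theta_k) for the same input x under K models
    sampled from the (approximate) posterior. A low score means the models
    either assign low likelihood or disagree strongly, which is treated as
    evidence that x is out-of-distribution.
    """
    return fmean(log_liks) - pvariance(log_liks)


# Hypothetical log-likelihoods under four sampled models.
in_dist = [-90.0, -91.0, -89.0, -90.0]   # models agree
ood = [-80.0, -120.0, -60.0, -100.0]     # same mean, large disagreement

score_in = waic_score(in_dist)
score_ood = waic_score(ood)
```

Both inputs have the same average log-likelihood here, so a single-model density estimate could not separate them; only the variance term does. That is the sense in which epistemic uncertainty over the weights can help.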
Crowdsourcing has emerged as an effective platform to label a large volume of data in a cost- and time-efficient manner. Most previous works have focused on designing an efficient algorithm to recover only the ground-truth labels of the data. In this paper, we consider multi-choice crowdsourced labeling with the goal of recovering not only the ground truth but also the most confusing answer and the confusion probability. The most confusing answer provides useful information about the task by revealing the most plausible answer other than the ground truth and how plausible it is. To theoretically analyze such scenarios, we propose a model where there are top-two plausible answers for each task, distinguished from the rest of choices. Task difficulty is quantified by the confusion probability between the top two, and worker reliability is quantified by the probability of giving an answer among the top two. Under this model, we propose a two-stage inference algorithm to infer the top-two answers as well as the confusion probability. We show that our algorithm achieves the minimax optimal convergence rate. We conduct both synthetic and real-data experiments and demonstrate that our algorithm outperforms other recent algorithms. We also show the applicability of our algorithms in inferring the difficulty of tasks and training neural networks with the soft labels composed of the top-two most plausible classes.
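The top-two model can be illustrated with a small simulation. The sketch below is not the paper's two-stage algorithm; it is a plain vote-counting baseline under assumed details. A worker with reliability `p` answers among the top two with probability `p`, choosing the ground truth with probability `q` and the confusing answer otherwise, and with probability `1 - p` picks uniformly among the remaining choices. All names and parameter values are hypothetical.

```python
import random
from collections import Counter


def simulate_answers(truth, confusing, q, reliabilities, num_choices, rng):
    """Draw one answer per worker for a single task (assumed model)."""
    others = [c for c in range(num_choices) if c not in (truth, confusing)]
    answers = []
    for p in reliabilities:
        if rng.random() < p:
            # Reliable response: one of the top two plausible answers.
            answers.append(truth if rng.random() < q else confusing)
        else:
            # Unreliable response: uniform over the remaining choices.
            answers.append(rng.choice(others))
    return answers


def infer_top_two(answers):
    """Vote-counting estimate of (top answer, runner-up, confusion prob.).

    Requires at least two distinct answers among the votes.
    """
    (first, n1), (second, n2) = Counter(answers).most_common(2)
    return first, second, n1 / (n1 + n2)


rng = random.Random(0)
votes = simulate_answers(
    truth=0, confusing=1, q=0.7,
    reliabilities=[0.8] * 2000, num_choices=5, rng=rng,
)
top, runner_up, q_hat = infer_top_two(votes)
```

With identical, fairly reliable workers, counting votes is enough to recover the top two and a rough estimate of `q`. The harder setting, where reliabilities differ and are unknown, is what motivates an inference algorithm like the one the abstract describes.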